feat(client): realtime client #29
Conversation
```ts
const data = JSON.parse(event.data);
// Drop messages that are not related to the actual result.
// In the future, we might want to handle other types of messages.
if (data.status !== 'error' && data.type !== 'x-fal-message') {
```
nice!
Does this drop messages or just throttle them (i.e. can a backlog of messages build up and cause laggy re-renders)? On the server side we also keep a limited deque, so in the case of the LCM app, when the client sends 10 prompts before we can complete the current image generation, we only keep the last 3 in memory and respond to those.

The server will never respond to requests 2 to 8, and this is by design (although maybe it can't be generalized). It's not a generic solution for all apps, but at least for realtime I think either some sort of backpressure system, or at the very least a bounded buffer, would be helpful if there isn't one already. Not for this PR, but something to think about (since we already handle it on the server for the limited apps we have).
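The server-side limited deque described above could be mirrored on the client. A minimal sketch, with illustrative names (this is not the library's actual implementation): keep only the most recent N pending messages and drop older ones.

```typescript
// Hypothetical client-side backpressure sketch: a bounded buffer that
// keeps only the newest `capacity` items, like the server-side deque.
class BoundedBuffer<T> {
  private items: T[] = [];

  constructor(private readonly capacity: number) {}

  push(item: T): void {
    this.items.push(item);
    // Drop the oldest entries once we exceed capacity.
    while (this.items.length > this.capacity) {
      this.items.shift();
    }
  }

  drain(): T[] {
    const pending = this.items;
    this.items = [];
    return pending;
  }
}

const buffer = new BoundedBuffer<string>(3);
["p1", "p2", "p3", "p4", "p5"].forEach((prompt) => buffer.push(prompt));
console.log(buffer.drain()); // only the last 3 prompts survive
```

With a buffer like this in front of the socket, a burst of prompts never grows the backlog beyond a fixed bound, so stale requests are dropped instead of queued.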
libs/client/src/realtime.ts
```ts
async function getConnection(app: string, key: string): Promise<WebSocket> {
  const url = builRealtimeUrl(app);
  // const token = await getToken(app);
```
So if we integrate (or enable) this, would it mean we have to make a new REST request (to generate a JWT) before every connection, even when we still hold an unexpired token? Since we regularly close the underlying WebSockets, this could be inefficient without some sort of caching to squeeze as much as possible out of each individual JWT. Something like getToken with a TTL'd cache, if it's easy to implement, should help perf a lot in a world where re-connects happen often.
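The TTL'd cache suggested here could look roughly like the sketch below. `fetchToken` and the TTL value are illustrative assumptions, not the client's actual API:

```typescript
// Hypothetical sketch of a TTL'd JWT cache: re-connects reuse an
// unexpired token instead of hitting the REST endpoint every time.
type CachedToken = { token: string; expiresAt: number };

function createTokenCache(
  fetchToken: (app: string) => Promise<string>, // illustrative name
  ttlMs: number,
) {
  const cache = new Map<string, CachedToken>();
  return async function getToken(app: string): Promise<string> {
    const cached = cache.get(app);
    if (cached && cached.expiresAt > Date.now()) {
      return cached.token; // still valid, skip the REST round-trip
    }
    const token = await fetchToken(app);
    cache.set(app, { token, expiresAt: Date.now() + ttlMs });
    return token;
  };
}
```

The TTL would ideally be derived from the JWT's own `exp` claim (minus a safety margin) rather than a fixed constant.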
Yes, once the wip impl is done, it will reuse the token as long as it's not expired.
This means the
This is exactly the side-effect of the current throttle implementation. :) cc @isidentical
Amazing!
Does this work with the pages router, or just the app router? I'm getting 401 errors.
Introducing new public API
fal.realtime.connect
In order to support the new realtime protocol built on top of WebSockets, this PR introduces a new client API. It allows developers to interact with model APIs that support realtime inference with very little effort, and it is compatible with different JS runtimes and frameworks.
API signature + ergonomics
The API signature is as follows:
It is used as follows:
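As a hedged sketch of what the signature and usage could look like, based only on the options discussed in this PR (throttling, `clientOnly`, `connectionKey`) and not copied from the actual library source:

```typescript
// Illustrative sketch only: option and handler names are inferred from
// the discussion in this PR, not from the real fal client.
type RealtimeConnection<Input> = {
  send(input: Input): void;
  close(): void;
};

type RealtimeOptions<Result> = {
  connectionKey?: string;    // reuse the socket across re-renders
  clientOnly?: boolean;      // no-op during server-side rendering
  throttleInterval?: number; // ms between outgoing messages
  onResult: (result: Result) => void;
  onError?: (error: Error) => void;
};

// Mock transport for illustration; the real client speaks WebSockets.
function connect<Input, Result>(
  _app: string,
  _options: RealtimeOptions<Result>,
): RealtimeConnection<Input> {
  return {
    send(_input: Input) { /* would serialize and send over the socket */ },
    close() { /* would tear down or release the shared socket */ },
  };
}

// Usage sketch (app id and payload shapes are made up):
const connection = connect<{ prompt: string }, { images: string[] }>(
  "my-org/my-lcm-app",
  { connectionKey: "drawing-demo", onResult: (result) => console.log(result) },
);
connection.send({ prompt: "a cat" });
```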
Implementation notes
Event throttling
One of the popular use-cases for LCM/realtime is generating images as the user draws on a canvas and/or edits a prompt. However, these events fire dozens, sometimes hundreds, of times per second, so the client provides a built-in throttling mechanism for outgoing messages that keeps the experience smooth by default without overloading the endpoint with too many requests.
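A throttle of this kind can be sketched as follows (a minimal illustration, not the client's actual implementation): send the first message immediately, then coalesce bursts so that at most one message goes out per interval, always keeping the latest payload.

```typescript
// Minimal leading-edge throttle with trailing coalescing: at most one
// outgoing message per `intervalMs`, and bursts collapse to the newest
// payload instead of queueing up.
function throttle<T>(send: (payload: T) => void, intervalMs: number) {
  let last = 0;
  let pending: T | undefined;
  let timer: ReturnType<typeof setTimeout> | undefined;

  return (payload: T) => {
    const now = Date.now();
    if (now - last >= intervalMs) {
      last = now;
      send(payload); // quiet period: send immediately
      return;
    }
    // Burst: remember only the newest payload for the next tick.
    pending = payload;
    if (timer === undefined) {
      timer = setTimeout(() => {
        timer = undefined;
        last = Date.now();
        if (pending !== undefined) {
          send(pending);
          pending = undefined;
        }
      }, intervalMs - (now - last));
    }
  };
}

// Usage sketch: forward only the freshest prompt, at most twice a second.
const sendThrottled = throttle<string>((prompt) => {
  /* connection.send(prompt) in the real client */
}, 500);
sendThrottled("a cat");
sendThrottled("a cat, oil painting"); // coalesced into the next tick
```

Dropping intermediate payloads is exactly what makes this safe for canvas drawing: only the latest state of the canvas matters.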
Client+Server rendering
A popular architecture for front-end frameworks nowadays allows developers to share the same code when rendering pages on the server and on the client (e.g. Next, Remix, etc). This enables devs to create rich pages that arrive populated from the server but are also highly interactive on the client. However, it comes with challenges, as developers need to be mindful that some components are only suitable for client-side rendering, or vice-versa. The client accounts for such use-cases and provides a `clientOnly: boolean` option that, when `true`, makes the call a no-op when executed on the server side.

Re-rendering cycles
Another common challenge in popular front-end architectures is creating components that react to state whenever the framework decides to reconcile the data state with its UI representation. This means that a component, which is simply a function, can be called multiple times during its lifecycle, and code that triggers heavy-weight, long-running operations or manages resources (e.g. WebSocket connections) needs to account for that.
The client introduces a `connectionKey` option that, when passed, makes the client reuse the WebSocket connection registered for the same key. This lets devs stop worrying about rendering cycles: `connect` can be called multiple times during the rendering phase and no new connections/listeners/etc will be created.

Here's a full example of the client working on a Next.js page that does both SSR and CSR:
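The two guarantees such a Next.js page relies on can be sketched framework-agnostically (illustrative names, not the library's internals): `clientOnly` short-circuits on the server, and `connectionKey` makes repeated `connect()` calls during re-renders reuse one connection.

```typescript
// Hypothetical sketch of the SSR guard + connection reuse behavior.
type FakeConnection = { id: number };

const connections = new Map<string, FakeConnection>();
let nextId = 0;

function connect(
  app: string,
  options: { connectionKey?: string; clientOnly?: boolean },
): FakeConnection | undefined {
  // SSR guard: no `window` global means we are rendering on the server.
  const isServer =
    (globalThis as Record<string, unknown>).window === undefined;
  if (options.clientOnly && isServer) {
    return undefined; // no-op on the server
  }
  const key = options.connectionKey ?? `${app}#${nextId}`;
  let conn = connections.get(key);
  if (!conn) {
    conn = { id: nextId++ };
    connections.set(key, conn);
  }
  return conn;
}

// Re-render simulation: calling connect twice with the same key
// returns the same underlying connection object.
const first = connect("my-app", { connectionKey: "page" });
const second = connect("my-app", { connectionKey: "page" });
console.log(first === second); // true
```

In the real client the map would hold live WebSockets (plus their listeners), but the reuse-by-key idea is the same.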
Demo
The `demo-nextjs-app-router` app now includes a "realtime" demo page that demonstrates the client usage:

fal-realtime-connect-demo.mp4
Pending